Action Recognition using Visual Attention

Authors

  • Shikhar Sharma
  • Ryan Kiros
  • Ruslan Salakhutdinov
Abstract

We propose a soft attention based model for the task of action recognition in videos. We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units which are deep both spatially and temporally. Our model learns to focus selectively on parts of the video frames and classifies videos after taking a few glimpses. The model essentially learns which parts in the frames are relevant for the task at hand and attaches higher importance to them. We evaluate the model on UCF-11 (YouTube Action), HMDB-51 and Hollywood2 datasets and analyze how the model focuses its attention depending on the scene and the action being performed.
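The abstract does not spell out the attention computation, so the following is only a minimal sketch of generic soft attention, not the authors' implementation. It assumes each frame has already been reduced to a list of per-location feature vectors and that a relevance score per location is supplied from elsewhere (in a full model a recurrent network would produce these scores). The names `softmax`, `soft_attend`, `features`, and `scores` are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attend(features, scores):
    """Weight each location's feature vector by its attention weight.

    features: list of N location vectors, each of length D (assumed input).
    scores:   list of N raw relevance scores (assumed input).
    Returns the attention weights and the weighted-sum "glimpse" vector.
    """
    if len(features) != len(scores):
        raise ValueError("need one score per location")
    weights = softmax(scores)
    dim = len(features[0])
    glimpse = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return weights, glimpse


# Toy frame with four locations and two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
scores = [0.0, 0.0, 2.0, 0.0]  # the third location is scored as most relevant
weights, glimpse = soft_attend(features, scores)
```

Because the weights are a smooth function of the scores, a model built this way can be trained end to end with ordinary backpropagation, and the weights can be inspected to see which locations received more importance.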

Similar resources

Transfer from action to perception: The effect of motor-perceptual enrichment

This study investigated the effect of audiovisual integration on action-perception transfer. Forty subjects were randomly divided into four groups: visual, visual-auditory, control visual, and control visual-auditory. The visual groups watched the movement pattern of a skilled basketball player; the other groups, in addition to watching the pattern, heard the elbow angular velocity as sonification. In first st...

Action Classification and Highlighting in Videos

Inspired by recent advances in neural machine translation that jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity recognition. Our model jointly learns to classify actions and highlight frames associated with the action, by attending to salient visual information through a jointly learned soft-attention...

Online learning of task-driven object-based visual attention control

We propose a biologically-motivated computational model for learning task-driven and object-based visual attention control in interactive environments. In this model, top-down attention is learned interactively and is used to search for a desired object in the scene through biasing the bottom-up attention in order to form a need-based and object-driven state representation of the environment. O...

Cortex-inspired Recurrent Networks for Developmental Visual Attention and Recognition

By Matthew Luciw. It is unknown how the brain self-organizes its internal wiring without a holistically-aware central controller. How does the brain develop internal object representations for a massive number of objects? How do such representations enable tightly intertwined attention and recognition in the pre...

Early Posterior Negativity as Facial Emotion Recognition Index in Children With Attention Deficit Hyperactivity Disorder

Introduction: Studies indicate that children with Attention Deficit Hyperactivity Disorder (ADHD) have deficits in social and emotional functions. It can be hypothesized that these children have some deficits in early stages of facial emotion discrimination. Based on this hypothesis, the present study investigated neural correlates of early visual processing during emotional face recognition in...

Journal:
  • CoRR

Volume: abs/1511.04119

Publication date: 2015